Fovea localization is one of the most popular tasks in ophthalmological medical image analysis, in which the coordinates of the center point of the macula lutea, i.e., the fovea centralis, are to be computed from color fundus images. In this work, we treat the localization problem as a classification task, where the coordinates along the X and Y axes are regarded as the target classes. Furthermore, the combination of the softmax activation function and the cross-entropy loss function is modified into its multi-scale variant to encourage the predicted coordinates to lie close to the ground truth. Based on color fundus photography images, we empirically show that the proposed multi-scale softmax cross-entropy yields better performance than both the vanilla version and mean squared error with sigmoid activation, providing a novel approach to coordinate regression.
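The abstract does not spell out the multi-scale construction. Below is a minimal sketch, assuming that "multi-scale" means pooling adjacent coordinate bins into progressively coarser groups so that near-miss predictions at fine resolution still receive credit at coarse resolution; the function name, scales, and pooling scheme are illustrative, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def multiscale_softmax_ce(logits, target_bin, scales=(1, 4, 16)):
    """Cross-entropy over coordinate bins, averaged across coarser scales.

    logits:     (B, N) scores, one per coordinate bin; N divisible by each scale.
    target_bin: (B,) index of the ground-truth bin.
    At scale s, groups of s adjacent bins are merged, so predictions near
    the true bin are still partially rewarded at coarse resolution.
    """
    loss = 0.0
    B, N = logits.shape
    for s in scales:
        # log-sum-exp pooling keeps the softmax semantics after grouping:
        # softmax(pooled)[g] equals the summed probability of group g
        pooled = logits.reshape(B, N // s, s).logsumexp(dim=2)
        loss = loss + F.cross_entropy(pooled, target_bin // s)
    return loss / len(scales)

# e.g. an x-coordinate head over 512 bins for a batch of 8 fundus images
x_logits = torch.randn(8, 512)
x_true = torch.randint(0, 512, (8,))
loss = multiscale_softmax_ce(x_logits, x_true)
```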
Reinforcement Learning (RL) is currently one of the most commonly used techniques for traffic signal control (TSC), as it can adaptively adjust traffic signal phases and durations according to real-time traffic data. However, a fully centralized RL approach is beset with difficulties in a multi-intersection network scenario because of the exponential growth of the state-action space as the number of intersections increases. Multi-agent reinforcement learning (MARL) can overcome the high-dimensionality problem by distributing global control among local RL agents, but it also brings new challenges, such as failure to converge caused by the non-stationary Markov Decision Process (MDP). In this paper, we introduce an Off-Policy Nash Deep Q-Network (OPNDQN) algorithm, which mitigates the weaknesses of both the fully centralized and MARL approaches. OPNDQN addresses the problem that traditional algorithms cannot handle traffic models with large state-action spaces by using a fictitious-play approach at each iteration to find a Nash equilibrium among neighboring intersections, from which no intersection has an incentive to deviate unilaterally. One of the main advantages of OPNDQN is that it mitigates the non-stationarity of the multi-agent Markov process, because it accounts for the mutual influence among neighboring intersections by sharing their actions. On the other hand, when training a large traffic network, the convergence rate of OPNDQN is higher than that of existing MARL approaches because it does not incorporate the full state information of every agent. We conduct extensive experiments using the Simulation of Urban MObility (SUMO) simulator and show the clear superiority of OPNDQN over several existing MARL approaches in terms of average queue length, episode training reward, and average waiting time.
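The abstract gives only the high-level idea of the equilibrium search. The sketch below is a schematic best-response iteration, one common reading of "fictitious play", in which each intersection's Q-network conditions on its neighbors' current actions; `q_nets`, `states`, and `neighbors` are hypothetical placeholders, not OPNDQN's actual interfaces.

```python
import torch

def fictitious_play_joint_action(q_nets, states, neighbors, iters=10):
    """Best-response iteration toward a Nash equilibrium among intersections.

    q_nets[i](state_i, neigh_actions_i) -> (n_actions,) tensor of Q-values.
    Each agent best-responds to its neighbors' current actions; stop when
    no agent has an incentive to deviate unilaterally (or iters runs out).
    """
    n = len(q_nets)
    actions = [0] * n                              # arbitrary initial joint action
    for _ in range(iters):
        changed = False
        for i in range(n):
            neigh = torch.tensor([float(actions[j]) for j in neighbors[i]])
            best = int(q_nets[i](states[i], neigh).argmax())
            if best != actions[i]:
                actions[i], changed = best, True
        if not changed:                            # approximate Nash equilibrium
            break
    return actions

# toy usage: 2 intersections, 4 signal phases, fixed dummy Q-values
q0, q1 = torch.randn(4), torch.randn(4)
q_nets = [lambda s, a: q0, lambda s, a: q1]
actions = fictitious_play_joint_action(q_nets, [None, None], [[1], [0]])
```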
This is the second part of a two-part paper that provides a new strategy for the heterogeneous change detection (HCD) problem, namely solving HCD from the perspective of graph signal processing (GSP). We construct a graph to represent the structure of each image and treat each image as a graph signal defined on that graph. In this way, the HCD problem is converted into a comparison of the signal responses of systems defined on the graphs. In the first part, changes are measured by comparing the structural differences between the graphs in the vertex domain. In this second part, we analyze GSP for HCD from the spectral domain. We first analyze the spectral properties of different images on the same graph and show that their spectra exhibit both commonalities and differences; in particular, it is precisely the changes that cause the differences in the spectra. We then propose a regression model for HCD that decomposes the source signal into a regressed signal and a change signal, and requires the regressed signal to have the same spectral properties as the target signal on the same graph. With the help of graph spectral analysis, the proposed regression model is flexible and scalable. Experiments on seven real datasets show the effectiveness of the proposed method.
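As a rough illustration of the spectral idea (not the paper's actual regression model), the sketch below computes the graph Fourier transform of both signals on the same graph and reshapes the source spectrum onto the target's per-frequency magnitudes; the residual then plays the role of the change signal. The magnitude-matching rule is an assumption for illustration only.

```python
import numpy as np

def spectral_regression_change(L, x, y, eps=1e-8):
    """Sketch of a graph-spectral decomposition for HCD.

    L : (n, n) combinatorial Laplacian of the graph built from one image.
    x : source image signal on the graph vertices.
    y : target image signal on the same graph.
    Splits x into a regressed part whose per-frequency energy matches
    y's spectrum, plus a change signal (the residual).
    """
    w, U = np.linalg.eigh(L)          # graph Fourier basis (eigenvectors of L)
    x_hat, y_hat = U.T @ x, U.T @ y   # GFT of both signals
    # per-frequency gain that reshapes x's spectrum onto y's magnitudes
    g = np.abs(y_hat) / (np.abs(x_hat) + eps)
    x_reg = U @ (g * x_hat)           # regressed signal: spectrum matches y
    x_change = x - x_reg              # change signal
    return x_reg, x_change
```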
This paper provides a new strategy for the heterogeneous change detection (HCD) problem: solving HCD from the perspective of graph signal processing (GSP). We construct a graph for each image to capture its structural information and treat each image as a graph signal. In this way, HCD is converted into a GSP problem: a comparison of the responses of systems defined on two graphs, seeking the structural differences (Part I) and the signal differences (Part II) caused by changes between the heterogeneous images. In this first part, we analyze HCD with GSP in the vertex domain. We first show that for unchanged images the structures are consistent, so the outputs of the same signal passed through systems defined on the two graphs are similar. However, once a region changes, the local structure of the image changes, i.e., the connectivity of the vertices covering that region changes. We can then compare the output signals obtained by passing the same input graph signal through filters defined on the two graphs to detect the changes. We design different filters in the vertex domain, which can flexibly explore the high-order neighborhood information hidden in the original graphs. We also analyze, from the perspective of signal propagation, the detrimental effects that changed regions have on the change detection results. Experiments conducted on seven real datasets show the effectiveness of the vertex-domain-filtering-based HCD method.
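A minimal sketch of the vertex-domain filtering idea: the same input signal is passed through polynomial graph filters defined on the two image graphs, and large per-vertex output differences score the changes. The use of plain adjacency powers and the filter coefficients are illustrative assumptions, not the paper's specific filter designs.

```python
import numpy as np

def polynomial_filter(A, x, coeffs):
    """h(A) x = sum_k c_k A^k x; higher powers of A reach
    higher-order neighborhoods of each vertex."""
    Ax = x.astype(float)
    out = np.zeros_like(Ax)
    for c in coeffs:
        out += c * Ax
        Ax = A @ Ax
    return out

def vertex_domain_change_map(A_src, A_tgt, x, coeffs=(0.5, 0.3, 0.2)):
    """Pass the same graph signal x through filters defined on the two
    image graphs; per-vertex output differences indicate changed regions."""
    y_src = polynomial_filter(A_src, x, coeffs)
    y_tgt = polynomial_filter(A_tgt, x, coeffs)
    return np.abs(y_src - y_tgt)      # per-vertex change score
```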
Multi-scale architectures and attention modules have shown their effectiveness in many deep-learning-based image deraining methods. However, manually designing and integrating these two components into a neural network requires considerable labor and extensive expertise. In this paper, a high-performance Multi-scale Attentive Neural Architecture Search (MANAS) framework is technically developed. The proposed method formulates a new multi-scale attention search space with multiple flexible modules that are favorable for the image deraining task. Within this search space, multi-scale attentive cells are built and further used to construct a powerful image deraining network. The internal multi-scale architecture of the deraining network is searched automatically with a gradient-based search algorithm, which to some extent avoids the laborious process of manual design. Moreover, to obtain a robust image deraining model, a practical and effective multi-to-one training strategy is proposed, allowing the deraining network to acquire sufficient background information from multiple rainy images sharing the same background scene; at the same time, multiple loss functions, including an external loss, an internal loss, an architecture regularization loss, and a model complexity loss, are jointly optimized to achieve robust deraining performance and controllable model complexity. Extensive experimental results on synthetic and realistic rainy images, as well as on downstream vision applications (i.e., object detection and segmentation), consistently demonstrate the superiority of the proposed method.
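The abstract does not detail the search mechanics. The sketch below shows a DARTS-style differentiable mixed operation, one common form of gradient-based architecture search; the candidate operation set here is illustrative and far simpler than MANAS's multi-scale attention modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Differentiable mixture over candidate ops (DARTS-style relaxation):
    architecture weights alpha are learned jointly with network weights,
    and the strongest op is kept after the search."""
    def __init__(self, candidates):
        super().__init__()
        self.ops = nn.ModuleList(candidates)
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # arch params

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)       # relax the discrete choice
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

# illustrative candidates: convs with different receptive fields plus identity
channels = 32
candidates = [
    nn.Conv2d(channels, channels, 3, padding=1),
    nn.Conv2d(channels, channels, 3, padding=2, dilation=2),  # coarser scale
    nn.Identity(),
]
cell = MixedOp(candidates)
out = cell(torch.randn(1, channels, 64, 64))
```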
In this paper, we propose a deep-learning-based model to detect extratropical cyclones (ETCs) in the northern hemisphere, together with a novel workflow for processing images and generating labels for ETCs. We first label the cyclone centers by adapting an approach from Bonfanti et al. [1], and establish criteria for three classes of ETC labels: the developing, mature, and declining stages. We then propose a framework for labeling and preprocessing the images in our dataset. Once the images and labels are ready to serve as inputs, we build an object detection model based on the Single Shot Detector (SSD) adapted to the format of our dataset. We train and evaluate our model on the labeled dataset in two settings (binary and multiclass classification) while keeping a record of the results. Finally, we achieve relatively high performance in detecting mature-stage ETCs (a mean Average Precision of 86.64%) and acceptable results in detecting ETCs of all three classes (a mean Average Precision of 79.34%). We conclude that the Single Shot Detector model can successfully detect ETCs at different stages, and it shows great potential for future applications of ETC detection in other related settings.
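As a rough illustration of the training setup (the paper does not state which SSD implementation was used), here is how a torchvision SSD could be fine-tuned for the three ETC stages plus background; the box coordinates and labels are hypothetical.

```python
import torch
import torchvision

# 3 ETC stages (developing, mature, declining) + background class
model = torchvision.models.detection.ssd300_vgg16(num_classes=4)

# torchvision detection models take a list of images and a list of targets
images = [torch.rand(3, 300, 300)]
targets = [{
    "boxes": torch.tensor([[30.0, 40.0, 120.0, 150.0]]),  # xyxy, hypothetical
    "labels": torch.tensor([2]),                          # e.g. mature stage
}]

model.train()
loss_dict = model(images, targets)   # classification + box regression losses
loss = sum(loss_dict.values())
loss.backward()
```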
A recent study has revealed a phenomenon called neural collapse, in which the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and an imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial for discriminating the minor classes. To preserve these advantages, we introduce a regularizer on the feature centers to encourage the network to learn features closer to this appealing structure under imbalanced semantic segmentation. Experimental results show that our method brings significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
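The abstract does not give the regularizer's form. One plausible sketch: penalize the deviation of the (centered, normalized) class feature centers' pairwise cosines from the simplex-ETF value of -1/(K-1), which is the geometry neural collapse converges to.

```python
import torch

def etf_center_regularizer(centers):
    """Encourage class feature centers toward a simplex-ETF-like geometry:
    equal norms and pairwise cosine similarity of -1/(K-1).

    centers: (K, d) tensor, one feature center per class (K > 1).
    """
    K = centers.size(0)
    c = centers - centers.mean(dim=0, keepdim=True)   # center the class means
    c = torch.nn.functional.normalize(c, dim=1)       # unit norms
    cos = c @ c.t()                                   # pairwise cosines
    target = torch.full_like(cos, -1.0 / (K - 1))     # ETF off-diagonal value
    target.fill_diagonal_(1.0)
    return ((cos - target) ** 2).mean()
```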
Weakly-supervised object localization aims to indicate both the category and the spatial extent of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet they ignore the co-occurrence confounder between object and context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurring context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object features, we further propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in resolving the dilemma between classification and localization performance.
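The exact distillation objective is defined in the paper, not here; the sketch below shows a generic two-teacher knowledge-distillation loss that balances a classification-oriented and a localization-oriented teacher with a weight beta, as one plausible form of the multi-teacher idea.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, cls_teacher_logits,
                          loc_teacher_logits, T=4.0, beta=0.5):
    """Distill from a classification-oriented teacher and a
    localization-oriented teacher; beta trades off the two sources
    of knowledge (a sketch, not KD-CI-CAM's exact weighting)."""
    def kd(teacher_logits):
        # standard temperature-scaled KL distillation term
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
    return beta * kd(cls_teacher_logits) + (1 - beta) * kd(loc_teacher_logits)
```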
Witnessing the impressive achievements of pre-training techniques on large-scale data in computer vision and natural language processing, we ask whether this idea could be adopted in a grab-and-go spirit to mitigate the sample-inefficiency problem of visuomotor driving. Given the highly dynamic and variable nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive amounts of information irrelevant to decision making, rendering predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for policy pre-training in visuomotor driving. We aim to learn policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo proceeds in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, taking two consecutive frames as input. In the second stage, the visual encoder learns a driving policy representation by predicting the future ego-motion and optimizing the photometric error based on the current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving-policy-related representations and is thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios demonstrate the superiority of our approach, with improvements ranging from 2% to over 100% given very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
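The abstract names the photometric error without specifying its form; the sketch below uses the standard SSIM-plus-L1 mix common in self-supervised depth and ego-motion pipelines, applied between a warped source frame and the target frame (the depth-and-pose warping step itself is omitted here).

```python
import torch
import torch.nn.functional as F

def photometric_error(pred, target, alpha=0.85):
    """Per-pixel SSIM + L1 photometric error between a warped source frame
    (pred) and the target frame, Monodepth-style. Inputs: (B, C, H, W)."""
    l1 = (pred - target).abs().mean(dim=1, keepdim=True)
    # local statistics over 3x3 windows for the SSIM term
    mu_p = F.avg_pool2d(pred, 3, 1, 1)
    mu_t = F.avg_pool2d(target, 3, 1, 1)
    var_p = F.avg_pool2d(pred * pred, 3, 1, 1) - mu_p ** 2
    var_t = F.avg_pool2d(target * target, 3, 1, 1) - mu_t ** 2
    cov = F.avg_pool2d(pred * target, 3, 1, 1) - mu_p * mu_t
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    dssim = ((1 - ssim) / 2).clamp(0, 1).mean(dim=1, keepdim=True)
    return alpha * dssim + (1 - alpha) * l1   # (B, 1, H, W) error map
```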
In this work, we focus on instance-level open-vocabulary segmentation, intending to expand a segmenter to instance-wise novel categories without mask annotations. We investigate a simple yet effective framework built on image captions, exploiting the thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework on top of a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignment. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset under two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvement for novel classes on the OSPS benchmark under various settings.
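The grounding loss is only named in the abstract; as one plausible instantiation, the sketch below aligns mask-query features with caption-noun text embeddings via a symmetric InfoNCE loss. All tensor names are hypothetical, and the query-noun matching step is assumed to have been done beforehand.

```python
import torch
import torch.nn.functional as F

def grounding_loss(query_emb, noun_emb, tau=0.07):
    """Contrastive alignment between mask-query features and caption-noun
    embeddings (a sketch; the exact CGG loss is defined in the paper).

    query_emb: (N, d) object-query features matched to N grounded nouns.
    noun_emb:  (N, d) text embeddings of those nouns, in the same order.
    """
    q = F.normalize(query_emb, dim=1)
    t = F.normalize(noun_emb, dim=1)
    logits = q @ t.t() / tau                      # (N, N) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    # symmetric InfoNCE: each query matches its own noun and vice versa
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```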